Compiling Keyphrase Candidates for Scientific Literature Based on Wikipedia

Authors

  • Hung-Hsuan Chen
  • Jian Wu
  • C. Lee Giles
Abstract

Keyphrase candidate compilation is a crucial step for both supervised and unsupervised keyphrase extractors. Traditional methods usually compile the candidate list from the lexical or frequency properties of phrases. However, terms collected on the basis of these properties are not always semantically meaningful. We show that Wikipedia can be a great auxiliary resource for compiling meaningful keyphrase candidates for scientific literature. We conducted empirical experiments on digital libraries of two disciplines, namely Computer Science and Chemistry. The results suggest that Wikipedia has good coverage of the two disciplines and has the potential to be applied to other scientific disciplines.
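
As a rough illustration of the idea in the abstract, the sketch below keeps only those word n-grams of a document that also appear in a set of Wikipedia article titles. It is a minimal sketch under stated assumptions, not the authors' method: the function name `candidate_phrases`, the `max_len` parameter, the tokenization rule, and the tiny in-memory title set are all hypothetical, and a real system would load titles from a Wikipedia dump.

```python
import re


def candidate_phrases(text, titles, max_len=4):
    """Return n-grams of `text` (up to `max_len` words) that match a title.

    `titles` is an assumed, pre-loaded collection of Wikipedia article
    titles; matching is case-insensitive and longer phrases come first.
    """
    normalized = {t.lower() for t in titles}
    # Hypothetical tokenizer: lowercase alphanumeric words, hyphens kept inside.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    found, seen = [], set()
    for n in range(max_len, 0, -1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in normalized and phrase not in seen:
                seen.add(phrase)
                found.append(phrase)
    return found


# Toy title set standing in for a real Wikipedia title list (assumption).
titles = {"Support vector machine", "Keyphrase extraction", "Machine"}
text = "We use a support vector machine for keyphrase extraction."
print(candidate_phrases(text, titles))
```

Frequent but meaningless n-grams such as "we use a" are dropped because they have no matching title, which is the contrast with purely lexical or frequency-based candidate lists that the abstract draws.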

Similar resources

Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents which utilizes Wikipedia as a thesaurus for candidate selection from documents’ content. We have devised a se...

SJTULTLAB: Chunk Based Method for Keyphrase Extraction

In this paper we present a chunk based keyphrase extraction method for scientific articles. Different from most previous systems, supervised machine learning algorithms are not used in our system. Instead, document structure information is used to remove unimportant contents; Chunk extraction and filtering is used to reduce the quantity of candidates;

DERIUNLP: A Context Based Approach to Automatic Keyphrase Extraction

The DERI UNLP team participated in the SemEval 2010 Task #5 with an unsupervised system that automatically extracts keyphrases from scientific articles. Our approach does not only consider a general description of a term to select keyphrase candidates but also context information in the form of “skill types”. Even though our system analyses only a limited set of candidates, it is still able to ...

Exploiting Description Knowledge for Keyphrase Extraction

Keyphrase extraction is essential for many IR and NLP tasks. Existing methods usually use the phrases of the document separately without distinguishing the potential semantic correlations among them, or other statistical features from knowledge bases such as WordNet and Wikipedia. However, the mutual semantic information between phrases is also important, and exploiting their correlations may p...

WikiRank: Improving Keyphrase Extraction Based on Background Knowledge

Keyphrases are an efficient representation of the main idea of documents. While background knowledge can provide valuable information about documents, it is rarely incorporated in keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on the background knowledge from Wikipedia. Firstly, we construct a semantic graph for the docum...

Publication date: 2017